A. The Provider Landscape

Choosing how to access LLMs

Agenda

  • A. Provider Landscape — Proprietary vs Open-Source vs Self-Hosted
  • B. API Anatomy — How HTTP calls to LLMs work
  • C. Security — Protecting your API keys
  • D. Wrap-up — Key takeaways

Model Access Decision Tree

graph TD
    A["Your Application"] --> B{"Model Access<br/>Strategy"}
    B --> C["Proprietary APIs"]
    B --> D["Open-Source APIs"]
    B --> E["Self-Hosted"]

    C --> C1["OpenAI / GPT-4"]
    C --> C2["Anthropic / Claude"]
    C --> C3["Google / Gemini"]

    D --> D1["Hugging Face"]
    D --> D2["Together AI"]
    D --> D3["Groq"]

    E --> E1["Ollama (Local)"]
    E --> E2["vLLM (Cloud GPU)"]
    E --> E3["AWS Bedrock"]

    style B fill:#FF7A5C,stroke:#1C355E,color:#fff
    style C fill:#1C355E,stroke:#1C355E,color:#fff
    style D fill:#00C9A7,stroke:#1C355E,color:#fff
    style E fill:#9B8EC0,stroke:#1C355E,color:#fff

Provider Comparison

| Aspect | Proprietary | Open-Source APIs | Self-Hosted |
|---|---|---|---|
| Cost | $0.50–$60 / M tokens | Free tiers available | Hardware / GPU costs |
| Quality | State-of-the-art | Rapidly improving | Same as open-source |
| Latency | Low (optimized infra) | Variable (free = slow) | Depends on hardware |
| Privacy | Data sent to provider | Data sent to provider | Full data control |
| Rate Limits | Generous (paid) | Restrictive (free) | No limits (HW-bound) |
| Setup | Minutes | Minutes | Hours to days |
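To make the cost column concrete, a back-of-the-envelope estimate helps when comparing providers. A minimal sketch — the traffic figures and the $3 / M-token price are illustrative assumptions, not any provider's actual pricing:

```python
def estimate_monthly_cost(requests_per_day, tokens_per_request, price_per_m_tokens):
    """Rough monthly spend: total tokens / 1M, times the price per million tokens."""
    monthly_tokens = requests_per_day * tokens_per_request * 30
    return monthly_tokens / 1_000_000 * price_per_m_tokens

# Example: 1,000 requests/day, ~1,500 tokens each, at an assumed $3 / M tokens
cost = estimate_monthly_cost(1_000, 1_500, 3.00)
print(f"${cost:.2f}/month")  # → $135.00/month
```

Run this with your own traffic estimates before committing to a paid provider.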

Why We Start With Hugging Face

For this course:

  1. Zero cost — No credit card required
  2. Thousands of models — Experiment freely
  3. Industry-standard — The “GitHub of ML”
  4. Transferable skills — Same patterns everywhere

In production you’ll mix providers:

  • Prototyping → Hugging Face (free)
  • Production quality → OpenAI / Anthropic
  • Privacy-sensitive → Self-hosted (Ollama, vLLM)

Design with provider abstraction from day one.
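One way to sketch that abstraction: application code depends on a small interface, and each vendor gets its own adapter behind it. The class names here are illustrative, and the HF adapter body is left as a stub (the real call is covered in section B):

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Minimal provider interface: swap backends without touching call sites."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class HuggingFaceProvider(LLMProvider):
    def __init__(self, token: str, model: str):
        self.token, self.model = token, model

    def complete(self, prompt: str) -> str:
        # A real implementation would POST to the HF router (see section B).
        raise NotImplementedError

class EchoProvider(LLMProvider):
    """Stand-in provider, handy for tests and offline development."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def summarize(provider: LLMProvider, text: str) -> str:
    # Application logic sees only the interface, never a vendor SDK.
    return provider.complete(f"Summarize: {text}")
```

Switching from prototyping to production then means constructing a different provider object, not rewriting call sites.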

B. API Anatomy

How HTTP requests to LLMs work

The Request–Response Cycle

sequenceDiagram
    participant App as Your Application
    participant API as HF Inference API
    participant Model as Hosted Model

    App->>API: POST /models/{model_id}
    Note over App,API: Authorization: Bearer hf_xxx
    Note over App,API: Body: {"inputs": "your text"}
    API->>Model: Load model (if cold)
    Model->>API: Generate response
    API->>App: JSON response

Warm vs Cold Models

Warm = already in memory (seconds). Cold = needs loading first (20–60s on free tier). Popular models are almost always warm.
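A small retry loop absorbs cold starts. The sketch below assumes the loading state surfaces as HTTP 503 (the classic Inference API behavior; check your endpoint's docs) and takes the request as a zero-argument callable so it is easy to test:

```python
import time

def post_with_retry(send, max_retries=5, wait_seconds=10):
    """Call send() and retry while the model is still loading (HTTP 503).

    `send` is a zero-arg function returning a response-like object
    with a .status_code attribute.
    """
    for attempt in range(max_retries):
        response = send()
        if response.status_code != 503:  # loaded (or a genuine error)
            return response
        time.sleep(wait_seconds)         # give the model time to load
    raise TimeoutError("Model did not become available in time")
```

In practice you would wrap the real call, e.g. `post_with_retry(lambda: requests.post(API_URL, headers=headers, json=payload))`.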

Request Structure

Visit Hugging Face Inference API Quicktour

import os
import requests

API_URL = "https://router.huggingface.co/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
payload = {
    "messages": [
        {
            "role": "user",
            "content": "How many 'G's in 'huggingface'?"
        }
    ],
    "model": "openai/gpt-oss-120b:fastest",
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])

Key API Parameters

| Parameter | What It Controls | Recommended Values |
|---|---|---|
| temperature | Randomness (0 = deterministic) | 0.1–0.3 factual, 0.7–0.9 creative |
| max_new_tokens | Response length limit | Set based on expected output |
| top_p | Nucleus sampling threshold | 0.9 (default) |
| top_k | Consider only top K tokens | 50 (default) |
| repetition_penalty | Avoid loops | 1.1–1.3 |
| return_full_text | Include prompt in output? | Usually False |
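Applied to the chat request from earlier, these knobs simply ride along in the JSON payload. A sketch, assuming the OpenAI-compatible chat completions convention, where the length cap is named max_tokens (max_new_tokens belongs to the raw text-generation endpoint):

```python
payload = {
    "model": "openai/gpt-oss-120b:fastest",
    "messages": [{"role": "user", "content": "Summarize HTTP in one sentence."}],
    "temperature": 0.2,   # low randomness for a factual task
    "top_p": 0.9,         # nucleus sampling threshold
    "max_tokens": 120,    # cap response length
}
```

Start from the recommended values in the table and tune one parameter at a time.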

C. Security First

Protecting your API keys

Never Hardcode Secrets

Rule #1

Never put API tokens directly in source code. Not even “just for testing.”

# ❌ NEVER do this
HEADERS = {"Authorization": "Bearer hf_abc123realtoken"}

# ✅ ALWAYS do this
import os
from dotenv import load_dotenv
load_dotenv()

token = os.getenv("HUGGINGFACE_API_TOKEN")

The .env Approach (Development)

Step 1: Create a .env file in your project root

# .env
HUGGINGFACE_API_TOKEN=hf_your_token_here

Step 2: Load it securely in Python

import os

def get_api_token():
    """Retrieve API token with validation."""
    token = os.getenv("HUGGINGFACE_API_TOKEN")
    if not token:
        raise EnvironmentError(
            "HUGGINGFACE_API_TOKEN not found. "
            "Create a .env file or set the environment variable."
        )
    if not token.startswith("hf_"):
        raise ValueError("Invalid token format — should start with 'hf_'.")
    return token

Critical: .gitignore

Do This Immediately

Add .env to .gitignore before your first commit. One leaked token on GitHub = account compromise.

# .gitignore
.env
.env.local
*.env

Production secrets management:

| Environment | Tool |
|---|---|
| Development | .env + python-dotenv |
| CI/CD | GitHub Secrets, GitLab Variables |
| Production | AWS Secrets Manager, HashiCorp Vault, Azure Key Vault |

System Environment Variables (CI/CD)

# Linux / macOS
export HUGGINGFACE_API_TOKEN=hf_your_token_here

# Windows PowerShell
$env:HUGGINGFACE_API_TOKEN = "hf_your_token_here"

# Windows CMD
set HUGGINGFACE_API_TOKEN=hf_your_token_here

Why Not Just .env Everywhere?

Environment variables can leak through process listings and crash dumps. In production, use secrets managers that provide encryption, rotation, and audit logging.
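As a sketch of the secrets-manager pattern: fetch the secret at startup and keep the client injectable, so application code stays testable without cloud credentials. The secret name and JSON layout below are illustrative assumptions; the client shape mirrors boto3's secretsmanager get_secret_value:

```python
import json

def load_api_token(client, secret_id="prod/llm/api-keys"):
    """Fetch a JSON secret and pull out the Hugging Face token.

    `client` is any object matching boto3's secretsmanager shape, i.e.
    get_secret_value(SecretId=...) -> {"SecretString": "..."}.
    """
    raw = client.get_secret_value(SecretId=secret_id)["SecretString"]
    return json.loads(raw)["HUGGINGFACE_API_TOKEN"]
```

In production you would pass `boto3.client("secretsmanager")`; in tests, a stub — and the token never touches disk or source control.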

D. Wrap-up

Key Takeaways

  1. Three access strategies: proprietary APIs, open-source APIs, self-hosted — choose based on cost, quality, latency, and privacy requirements.
  2. Hugging Face gives you free access to thousands of models — perfect for learning and prototyping.
  3. API calls are just HTTP POST requests with a JSON body; you parse the JSON response.
  4. Security is non-negotiable: .env files for development, secrets managers for production, .gitignore always.
  5. Design for provider abstraction — your code should make switching providers easy.

Up Next

Lab 2: Build a production-grade API client — from “hello world” to retry logic and caching.